GitHub Repository: https://github.com/mtdunphy-umd/SURV727-Final-Project
Since the controversial 2010 decision in Citizens United v. Federal Election Commission (FEC), the amount of money spent on political campaigns has risen sharply. In 2010, a midterm election in which Republicans picked up a half dozen seats in the Senate and more than 60 seats in the House, political expenditures totaled around $3.6 billion. In 2022, that number is projected to be around $8.9 billion.
Candidates with more money win more often, so fundraising has become a critical indicator for forecasting elections and understanding the current political climate.
Major partisan events that dominate the greater political discussion often cause a mass influx of money to political campaigns. With each fundraising cycle there are candidates that perform above and below expectations based on candidate quality and the overall political climate. Our goal is to identify trends in contributions that correlate with significant political shocks during the 2022 midterm cycle, as well as characteristics of candidates that affected fundraising performance. In our analysis, we build three linear models to identify which characteristics were significant in fundraising among House candidates, Senate candidates, and candidates running in competitive districts. We also build three logistic models predicting the chance of winning the general election among the same groups, controlling for the characteristics of their districts. We conduct principal component analysis to identify which variables are the most predictive for our model. We also use data visualizations, such as comparing top fundraisers by party and office and plotting fundraising numbers by day and week through the 2022 election year.
Data collected for this analysis came from multiple sources. To start, we collected data from 538’s “2022 Primary Project” GitHub repository which tracked the 2022 midterm primary candidates at the federal level. We processed this data to extract general election candidates by filtering candidates based on whether or not they won their primary election (or if there was one, their runoff election). 900 candidates were pulled with information on endorsements, incumbency status, race, and gender of the candidates.
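The filtering step described above can be sketched in base R as follows. This is a minimal illustration on toy data, not the project's actual code: the column names `Primary.Outcome` and `Runoff.Outcome` and the outcome labels are assumptions and may not match the 538 files exactly.

```r
# Toy stand-in for the 538 primary candidate data.
# Column names and outcome labels are hypothetical.
primary <- data.frame(
  Candidate       = c("A", "B", "C", "D"),
  Primary.Outcome = c("Won", "Lost", "Made runoff", "Made runoff"),
  Runoff.Outcome  = c(NA, NA, "Won", "Lost"),
  stringsAsFactors = FALSE
)

# A candidate advances to the general election by winning the primary
# outright or, where a runoff was held, by winning the runoff.
won_primary <- primary$Primary.Outcome == "Won"
won_runoff  <- !is.na(primary$Runoff.Outcome) & primary$Runoff.Outcome == "Won"

general <- primary[won_primary | won_runoff, ]
print(general$Candidate)
```

Checking the runoff column only where it is not missing keeps candidates from states without runoffs from being dropped by an `NA` comparison.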
Contributions to candidates were provided by the FEC, which tracks federal level campaign finance information. We tried using the FEC API through the R.openFEC package in R, but could not process the data in a reasonable time, so we instead pulled data straight from the FEC website. This data included ‘Candidate-committee linkages’ and ‘Contributions by individuals’ on the bulk data webpage. The 538 general election candidate data set was joined with the candidate-committee linkages FEC data set by office, state, district, and candidate last name to get the FEC candidate IDs and committee IDs that are used later in the analysis.
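As a rough sketch of the join described above, the example below matches a toy candidate table to a toy FEC lookup table on office, state, district, and last name. All table names, column names, and IDs here are made up for illustration; the real FEC files use their own layouts.

```r
# Toy 538-style candidate table (hypothetical columns and values).
candidates <- data.frame(
  Candidate = c("Jane Doe", "John Smith"),
  Office    = c("H", "S"),
  State     = c("CA", "GA"),
  District  = c("47", "00"),
  stringsAsFactors = FALSE
)

# Toy FEC-style lookup with names in "LAST, FIRST" form (hypothetical IDs).
fec_lookup <- data.frame(
  CAND_NAME            = c("DOE, JANE", "SMITH, JOHN", "DOE, ALEX"),
  CAND_OFFICE          = c("H", "S", "H"),
  CAND_OFFICE_ST       = c("CA", "GA", "TX"),
  CAND_OFFICE_DISTRICT = c("47", "00", "03"),
  CAND_ID              = c("H0000001", "S0000002", "H0000003"),
  CMTE_ID              = c("C0000001", "C0000002", "C0000003"),
  stringsAsFactors = FALSE
)

# Build an upper-case last-name key on both sides.
candidates$last_name <- toupper(sub("^.* ", "", candidates$Candidate))
fec_lookup$last_name <- toupper(sub(",.*$", "", fec_lookup$CAND_NAME))

# Left join: keep every candidate, attach IDs where all four keys match.
linked <- merge(
  candidates, fec_lookup,
  by.x = c("Office", "State", "District", "last_name"),
  by.y = c("CAND_OFFICE", "CAND_OFFICE_ST", "CAND_OFFICE_DISTRICT", "last_name"),
  all.x = TRUE
)
print(linked[, c("Candidate", "CAND_ID", "CMTE_ID")])
```

Joining on last name alone would confuse the two toy candidates named Doe; adding office, state, and district to the key is what separates them.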
Individual contribution bulk data was brought into R and found to have discrepancies and errors. Due to this, we manually pulled individual state receipts for the year 2022 using custom filters and unioned them together in R. Receipt data runs up to the last FEC report date, which was 10/19/2022. This data was joined with our candidate data set on candidate ID and committee ID to view total fundraising numbers for each candidate as well as the timeline of contributions.
Google Trends data was retrieved with the gtrendsR R package to identify key issues that were of high salience in this election cycle. We wanted to focus on the issue of abortion and Trump’s influence, which is why we searched the following terms: ‘abortion’, ‘supreme court’, ‘Trump’, ‘FBI’, and ‘crime’. Crime was included as a comparison as it was another major issue for many voters this cycle.
District level data was provided by Dave’s Redistricting App (DRA) for each state. This data was manually collected for each state and entered into this Google Sheet. The sources of data and processing done to calculate these values for districts can be viewed on DRA’s website. Data that was used included the percentage of white voters in 2020, the 2020 presidential vote share for each party, and the composite score for each party for each district. Composite scores are the mean share of the votes for presidential, senate, governor, and attorney general races from 2016 to 2020 for each party (2016 to 2021 for districts in New Jersey and Virginia). Only Utah’s districts did not have composite score data.
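To make the composite score definition concrete, the sketch below averages a party's vote share across several races for each district. The districts, column names, and vote shares are invented for illustration, and DRA's own calculation may weight or select races differently.

```r
# Toy district table with one party's vote share in four races
# (hypothetical column names and values).
district_votes <- data.frame(
  District      = c("XX-01", "XX-02"),
  pres_2020_dem = c(0.55, 0.40),
  sen_dem       = c(0.53, 0.42),
  gov_dem       = c(0.51, 0.44),
  ag_dem        = c(0.53, 0.42),
  stringsAsFactors = FALSE
)

share_cols <- c("pres_2020_dem", "sen_dem", "gov_dem", "ag_dem")

# Composite score: simple mean of the party's share across the races.
district_votes$dem_composite <- rowMeans(district_votes[, share_cols])
print(district_votes[, c("District", "dem_composite")])
```

For XX-01 this gives (0.55 + 0.53 + 0.51 + 0.53) / 4 = 0.53, so a single unusual race moves the composite far less than it would move any one vote share.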
Election result data was manually collected from the New York Times Midterm Tracker for House and Senate candidates and entered into this Google Sheet. During this process, we identified discrepancies between our candidate data and the information shown on NYT (a deceased House incumbent and districts that had no opponent). We used this information to filter districts in our final data set used for analysis.
# load 538 primary candidates
dem <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/dem_candidates.csv")
rep <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/rep_candidates.csv")
# District data provided by Dave's Redistricting App https://davesredistricting.org/maps#home
# Each state's congressional district data was manually exported and combined into a csv
dra_house <- read.csv(paste(wd, "/input data/", "DRA Data - House Districts .csv", sep=''))
dra_senate <- read.csv(paste(wd, "/input data/", "DRA Data - Senate Districts.csv", sep=''))
# Collecting data from the FEC
# candidate data for the 2021-2022 cycle pulled from here: https://www.fec.gov/data/browse-data/?tab=bulk-data
# description of file: https://www.fec.gov/campaign-finance-data/all-candidates-file-description/
header <- "CAND_ID|CAND_NAME|CAND_ICI|PTY_CD|CAND_PTY_AFFILIATION|TTL_RECEIPTS|TRANS_FROM_AUTH|TTL_DISB|TRANS_TO_AUTH|COH_BOP|COH_COP|CAND_CONTRIB|CAND_LOANS|OTHER_LOANS|CAND_LOAN_REPAY|OTHER_LOAN_REPAY|DEBTS_OWED_BY|TTL_INDIV_CONTRIB|CAND_OFFICE_ST|CAND_OFFICE_DISTRICT|SPEC_ELECTION|PRIM_ELECTION|RUN_ELECTION|GEN_ELECTION|GEN_ELECTION_PRECENT|OTHER_POL_CMTE_CONTRIB|POL_PTY_CONTRIB|CVG_END_DT|INDIV_REFUNDS|CMTE_REFUNDS"
base <- toString(read_file(paste(wd, "/input data/", "weball22.txt", sep=''))[1])
init <- file(paste(wd, "/output data/", "weball22_header.txt", sep=''))
writeLines(paste(append(header, base), sep = "|"), init)
close(init)
fec_candidate_info <- read.table(paste(wd, "/output data/", "weball22_header.txt", sep=''), sep= "|", header=TRUE)
# fec data has some misaligned columns using the read.table function. Data was imported to google sheets and manually modified to get columns in the correct order.
# google sheet: https://docs.google.com/spreadsheets/d/150dhkj1xrFwfi43ouYqu0LFLcj4jRMetXSRucYLTDIk/edit?usp=sharing
fec_candidate_info_fixed <- read.csv(paste(wd, "/input data/", "FEC Candidate Data - Fixed.csv", sep=''))
fec_candidate_info_fixed$FEC_index <- row.names(fec_candidate_info_fixed)
# contributions by individuals downloaded in bulk from here: https://www.fec.gov/data/browse-data/?tab=bulk-data
# description of file: https://www.fec.gov/campaign-finance-data/contributions-individuals-file-description/
# the file was too large to be imported into the github repository
# the file can be found in this google folder: https://drive.google.com/drive/folders/172eM7HDJ1CMVMPgpaN74dY3bzY9piSsz?usp=sharing
# the folder can be added to the input folder in the repository to reproduce the results
# added header info to the top of the itcont.txt file
fec_receipts <- read.table(paste(wd, "/input data/indiv22/", "itcont.txt", sep=''), sep= "|", header=TRUE, fill=TRUE)
write.csv(fec_receipts, paste(wd, "/output data/", "fec_receipts.csv", sep=''))
# the bulk data set showed discrepancies and errors so the data was manually pulled by individual states and joined together.
# the files were too large to be imported into the github repository
# the file can be found in this google folder: https://drive.google.com/drive/folders/10LzNSv9ucwlUsFGJbJP6nx2qP5cVkLFJ?usp=sharing
# the folder can be added to the input folder in the repository to reproduce the results
folder <- paste(wd, "/input data/FEC Receipts", sep='')
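# list.files() takes a regular expression, so match the .csv extension explicitly
csv_files <- list.files(folder, pattern = "\\.csv$")
df_list <- list()
for (file in csv_files) {
  # read every column as character so files with different inferred types can be row-bound
  df <- read_csv(file.path(folder, file), show_col_types = FALSE) %>%
    mutate_all(as.character)
  df_list[[file]] <- df
}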
## New names:
## • `committee_name` -> `committee_name...2`
## • `committee_name` -> `committee_name...9`
fec_receipts_combined <- dplyr::bind_rows(df_list)
write.csv(fec_receipts_combined, paste(wd, "/output data/", "fec_receipts_combined.csv", sep=''))
# to link candidates to their committees, FEC provides a bulk data set that can be found here under 'Candidate-committee linkages': https://www.fec.gov/data/browse-data/?tab=bulk-data
# a description of the file can be found here: https://www.fec.gov/campaign-finance-data/candidate-committee-linkage-file-description/
# the ccl.txt file was modified to include the header: ccl_header_file.csv
fec_candidate_committees <- read.table(paste(wd, "/input data/", "ccl.txt", sep=''), sep= "|", header=TRUE, fill=TRUE)
# select candidate id, committee id, linkage id
fec_candidate_committees_trimmed <- fec_candidate_committees %>%
select(CAND_ID, CMTE_ID, LINKAGE_ID)
write.csv(fec_candidate_committees_trimmed, paste(wd, "/output data/", "fec_candidate_committees_trimmed.csv", sep=''))
# Election results were collected manually from the NYT Election tracker: https://www.nytimes.com/interactive/2022/11/08/us/elections/results-senate.html?action=click&pgtype=Article&state=default&module=election-results&context=election_recirc&region=NavBar
# Data was collected and entered into this spreadsheet: https://docs.google.com/spreadsheets/d/1azMpRjQ9sRgW_ULf6qvpouKnwAHWRi_eY1U-RnLPKjI/edit?usp=sharing
nyt_election_results <- read.csv(paste(wd, "/input data/", "NYT Election Tracker Data - General Election Candidates.csv", sep='')) %>%
select(State, Office, District, Candidate, Winner.)
All data processing can be viewed in the R Markdown file in the GitHub repository.
In this section, we conduct exploratory data analysis to better understand the data. This includes visualizing keyword trends using Google Trends, plotting fundraising amounts over time, comparing top fundraising candidates for each party, and comparing origin and destination contribution amounts by state.
# take filtered data and join it with receipt data to view contributions over time
final_receipt_date <- left_join(filtered_data, fec_receipts_combined.trimmed, by = c("CMTE_ID" = "committee_id")) %>%
select(Candidate, State, State_processed, District, Party, `2020.White.%`, Pres_2020, Comp, `Dem-Rep.2020`, `Dem-Rep.Comp`, Gender.Num, White.Num, Black.Num, Asian.Num, Latino.Num, Middle_Eastern.Num, Native_American.Num, Incumbent.Num, Trump.Num, Party.Committee.Num, Emily.s.List.Num, Maggie.s.List.Num, Sanders.Num, Renew.America.Num, Winner.Num, Senator.Num, CAND_ID, CMTE_ID, transaction_id, contributor_state, contributor_zip, contributor_id, contribution_receipt_date, contribution_receipt_amount, contributor_aggregate_ytd)
final_receipt_date$contribution_receipt_amount[is.na(final_receipt_date$contribution_receipt_amount)] <- 0
head(final_receipt_date)
write.csv(final_receipt_date, paste(wd, "/output data/", "final_receipt_date.csv", sep=''))
# filter data for representatives and senators
final_house <- final %>%
filter(Senator.Num == 0) %>%
select(-Senator.Num)
head(final_house)
final_senate <- final %>%
filter(Senator.Num == 1) %>%
select(-District, -Senator.Num)
head(final_senate)
# filter data for competitive districts
# competitive districts defined as districts with + or - 10 ppt difference in 2020 presidential vote share (DEM - GOP)
final_competitive <- final %>%
filter(abs(`Dem-Rep.2020`) <= .1) %>%
select(-District)
head(final_competitive)
require(gtrendsR)
Gtrenddata2022 <- gtrends(c("Abortion", "Trump", "Supreme Court", "Crime", "FBI"),
geo = "US", time = "2022-01-01 2022-12-01",onlyInterest = TRUE)
plot(Gtrenddata2022)
Fig. 1 tracks Google search interest in the terms ‘Abortion’, ‘Crime’, ‘FBI’, ‘Supreme Court’, and ‘Trump’ during 2022. These terms generally reflect the issues that were salient during the 2022 election. Abortion and Trump were most prevalent in the public’s mind, as reflected by the highest number of US Google search hits. Both Abortion and Supreme Court spike around the time of the Dobbs decision that overturned Roe v. Wade. The spike in both FBI and Trump coincides with the Mar-a-Lago raid in early August. Crime was another major issue with voters in this election, yet we don’t see large changes in how often people searched Google for crime-related topics. We also tested ‘economy’, which had a similar trend to ‘crime’ but with fewer hits.
# cumulative sum of contributions
final_receipt_date_party_cumulative <- final_receipt_date %>%
group_by(Party, contribution_receipt_date) %>%
summarise(fundraised = sum(as.double(contribution_receipt_amount))) %>%
mutate(cumulative_fundraised = cumsum(fundraised))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_cumulative, aes(x = as.Date(contribution_receipt_date))) +
geom_line(aes(y = cumulative_fundraised, color = Party)) +
scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
labs(title = "Fig. 2 Cumulative Sum of Individual Contributions for Federal Candidates of \nEach Party",
x = "Day of Contribution",
y = "Contribution Amount ($)") +
scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
geom_vline(xintercept=c(as.Date("2022-04-01"), as.Date("2022-07-01"), as.Date("2022-10-01")), linetype = 2, color = "black", size = 0.1) +
geom_text(aes(as.Date("2022-04-01"), 0, label = "End Q1"), size= 3) +
geom_text(aes(as.Date("2022-07-01"), 0, label = "End Q2"), size= 3) +
geom_text(aes(as.Date("2022-10-01"), 0, label = "End Q3"), size= 3) +
theme_classic() +
theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## Warning: Removed 24 rows containing missing values (`geom_line()`).
Fig. 2 shows the cumulative sum of all contributions from 1/1/22 - 10/19/22. Both parties spike at the end-of-quarter (EOQ) deadlines: at the end of each disclosure period, every campaign has to publicly release its fundraising report for that quarter, so candidates tend to push to bring in as much money as they can to show fundraising momentum when their records are released. While both parties spike at EOQ deadlines, the Democrats clearly outperformed the Republicans after Q1. This could be due in part to Democrats relying more on individual contributions than Republicans, who tend to make up the shortfall in candidate fundraising with large spending from outside groups (https://www.washingtonpost.com/politics/2022/10/07/house-democrats-fundraising/).
# by day
final_receipt_date_party_day <- final_receipt_date %>%
group_by(Party, contribution_receipt_date) %>%
summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_day, aes(x = as.Date(contribution_receipt_date))) +
geom_line(aes(y = fundraised, color = Party)) +
scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
labs(title = "Fig. 3 Sum of Individual Contributions by Day for Federal Candidates \nof Each Party",
x = "Day of Contribution",
y = "Contribution Amount ($)") +
scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
geom_text(aes(as.Date("2022-05-02"), 12000000, label = "Dobbs Leak"), size= 3) +
geom_text(aes(as.Date("2022-06-24"), 10000000, label = "Dobbs Decision"), size= 3) +
geom_text(aes(as.Date("2022-08-08"), 8000000, label = "Trump FBI Raid"), size= 3) +
theme_classic() +
theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 24 rows containing missing values (`geom_line()`).
Fig. 3 shows the contributions each party brought in by day between 1/1/22 - 10/19/22. The largest spikes correspond with the EOQ deadlines as mentioned in Fig. 2. Also included are the dates of major political shocks - notably the Dobbs decision happened near the end of Q2. The Democrats led that spike in fundraising and then continued the daily lead throughout the next quarter. Also, GOP fundraising did not catch up after the FBI raid on Mar-a-Lago - which became a large part of the fundraising message from the GOP around that time.
# by week
final_receipt_date_party_week <- final_receipt_date %>%
mutate(week = week(as.Date(contribution_receipt_date))) %>%
group_by(Party, week) %>%
summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_week, aes(x = as.Date(paste(2022, week, 1, sep="-"), "%Y-%U-%u"))) +
geom_line(aes(y = fundraised, color = Party)) +
scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
labs(title = "Fig. 4 Sum of Individual Contributions by Week for Federal Candidates \nof Each Party",
x = "Week of Contribution",
y = "Contribution Amount ($)") +
scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
geom_text(aes(as.Date("2022-05-02"), 30000000, label = "Dobbs Leak"), size= 3) +
geom_text(aes(as.Date("2022-06-24"), 28000000, label = "Dobbs Decision"), size= 3) +
geom_text(aes(as.Date("2022-08-08"), 26000000, label = "Trump FBI Raid"), size= 3) +
theme_classic() +
theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 10 rows containing missing values (`geom_line()`).
Fig. 4 shows the weekly contributions to each party between 1/1/22 - 10/19/22. Democrats clearly held a fundraising advantage after the Dobbs leak for the rest of the cycle. While most spikes correlate between parties (when one party has a good week, so does the other), there are a few key moments shown where Democrats achieved smaller spikes without seeing a similar trend among Republican candidate fundraising.
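One alternative way to build weekly totals, sketched below in base R, is to bucket each contribution by the Monday that starts its week rather than converting week numbers back into dates. This is only an illustration on toy data: the column names mirror those used in this report, but the rows and amounts are made up.

```r
# Toy receipts (hypothetical rows; amounts stored as character, as in the report).
receipts <- data.frame(
  Party = c("DEM", "DEM", "GOP", "DEM"),
  contribution_receipt_date   = c("2022-05-02", "2022-05-08", "2022-05-03", "2022-05-09"),
  contribution_receipt_amount = c("100", "50", "75", "25"),
  stringsAsFactors = FALSE
)

# cut() with breaks = "week" labels each date with the Monday starting its week.
dates <- as.Date(receipts$contribution_receipt_date)
receipts$week_start <- as.Date(as.character(cut(dates, breaks = "week")))
receipts$amount <- as.numeric(receipts$contribution_receipt_amount)

# Sum contributions per party and week.
weekly <- aggregate(amount ~ Party + week_start, data = receipts, FUN = sum)
print(weekly)
```

Because `week_start` is already a date, it can be passed to the x axis directly, which keeps event markers such as the Dobbs leak aligned with the week in which they occurred.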
# look at only competitive districts
# competitive districts defined as districts with + or - 10 ppt difference in 2020 presidential vote share (DEM - GOP)
# by day
final_receipt_date_party_day_competitive <- final_receipt_date %>%
filter(abs(`Dem-Rep.2020`) <= .1) %>%
group_by(Party, contribution_receipt_date) %>%
summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_day_competitive, aes(x = as.Date(contribution_receipt_date))) +
geom_line(aes(y = fundraised, color = Party)) +
scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
labs(title = "Fig. 5 Sum of Individual Contributions by Day for Federal Candidates \nof Each Party Running in Competitive Districts",
x = "Day of Contribution",
y = "Contribution Amount ($)") +
scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
geom_text(aes(as.Date("2022-05-02"), 7400000, label = "Dobbs Leak"), size= 3) +
geom_text(aes(as.Date("2022-06-24"), 7000000, label = "Dobbs Decision"), size= 3) +
geom_text(aes(as.Date("2022-08-08"), 6400000, label = "Trump FBI Raid"), size= 3) +
theme_classic() +
theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 14 rows containing missing values (`geom_line()`).
Fig. 5 shows the contributions to candidates in competitive districts by party and day between 1/1/22 - 10/19/22. Competitive districts were determined based on the difference in each party's vote share in the 2020 presidential election, excluding any districts where that difference exceeded 10 percentage points. The fundraising gap between parties in competitive districts became more distinct as the election came closer. Democrats held the advantage in daily fundraising from the end of May forward. However, because Democrats and Republicans fundraise differently, the discrepancy may be partly explained by Republicans relying more on outside spending. More analysis and data on PAC contributions is needed to understand this dynamic.
# look at only competitive districts
# competitive districts defined as districts with + or - 10 ppt difference in 2020 presidential vote share (DEM - GOP)
# by week
final_receipt_date_party_week_competitive <- final_receipt_date %>%
filter(abs(`Dem-Rep.2020`) <= .1) %>%
mutate(week = week(as.Date(contribution_receipt_date))) %>%
group_by(Party, week) %>%
summarise(fundraised = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
ggplot(final_receipt_date_party_week_competitive, aes(x = as.Date(paste(2022, week, 1, sep="-"), "%Y-%U-%u"))) +
geom_line(aes(y = fundraised, color = Party)) +
scale_x_date(limits = as.Date(c("2022-01-01", "2022-10-19"))) +
labs(title = "Fig. 6 Sum of Individual Contributions by Week for Federal Candidates of \nEach Party Running in Competitive Districts",
x = "Week of Contribution",
y = "Contribution Amount ($)") +
scale_color_manual(values = c("GOP" = "red", "DEM" = "blue")) +
geom_vline(xintercept=c(as.Date("2022-05-02"), as.Date("2022-06-24"), as.Date("2022-08-08")), linetype = 2, color = "black", size = 0.1) +
geom_text(aes(as.Date("2022-05-02"), 30000000, label = "Dobbs Leak"), size= 3) +
geom_text(aes(as.Date("2022-06-24"), 28000000, label = "Dobbs Decision"), size= 3) +
geom_text(aes(as.Date("2022-08-08"), 26000000, label = "Trump FBI Raid"), size= 3) +
theme_classic() +
theme(panel.grid.major.y = element_line(color = "grey", size = 0.1))
## Warning: Removed 7 rows containing missing values (`geom_line()`).
Fig. 6 shows the contributions to candidates in competitive districts by party and week between 1/1/22 - 10/19/22. When broken down to just competitive districts, the 2nd and 3rd EOQ spikes show a more distinctive gap between Dem and GOP fundraising - Democrats dominated individual contributions in these districts.
# show the top fifteen candidates who fundraised the most for each party at each level of office
# House Democrats
final_top_house_dems <- final_house %>%
filter(Party == "DEM") %>%
top_n(15, total_fundraised_sum)
ggplot(final_top_house_dems, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
geom_bar(stat="identity") +
geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
scale_fill_gradient(low="lightblue",high="blue") +
labs(title = "Fig. 7 Top Fundraisers Among House Democratic Candidates",
x = "Contribution Amount ($)",
y = "Candidate Name") +
guides(fill=guide_legend(title="Contribution Amount")) +
scale_x_continuous(labels=scales::comma, limits = c(0, 7000000)) +
theme_classic()
Fig. 7 shows the 15 highest Democratic fundraisers running for the House in the 2022 midterms. Most of the candidates were in competitive districts, which helps explain the wider gap in fundraising by party in these districts in Fig. 5. Katie Porter (the CA-45 incumbent, who ran in the redrawn CA-47), the highest fundraiser on this visual, is one of the most well known members of the Democratic caucus. She has a high national presence and a strong social media following - she also was in a competitive race with a pro-life Republican candidate (https://www.latimes.com/politics/story/2022-10-20/2022-california-midterm-election-porter-baugh-abortion-economy-environment). Four of the 15 candidates shown narrowly lost their races.
# show the top fifteen candidates who fundraised the most for each party at each level of office
# House Republicans
final_top_house_gop <- final_house %>%
filter(Party == "GOP") %>%
top_n(15, total_fundraised_sum)
ggplot(final_top_house_gop, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
geom_bar(stat="identity") +
geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
scale_fill_gradient(low="lightcoral",high="red") +
labs(title = "Fig. 8 Top Fundraisers Among House Republican Candidates",
x = "Contribution Amount ($)",
y = "Candidate Name") +
guides(fill=guide_legend(title="Contribution Amount")) +
scale_x_continuous(labels=scales::comma, limits = c(0, 9500000)) +
theme_classic()
Fig. 8 shows the 15 highest Republican fundraisers running for the House in the 2022 midterm election. Most of the candidates shown maintain some sort of presence in the GOP mainstream: Kevin McCarthy, the minority leader, is most likely going to be Speaker of the House in January; Elise Stefanik is a rising star in GOP ranks who went from a moderate freshman Republican to ousting Liz Cheney as the 3rd-ranking member of GOP House leadership; and Jim Jordan is one of the most outspoken Trump loyalists. Harriet Hageman, the Trump loyalist GOP challenger who defeated Cheney in her primary in Wyoming, was the 5th highest fundraiser despite never having held federal office. Interestingly, Republican candidates of color (James, Kim, Ciscomani, Steel…) were all leaders in GOP House candidate fundraising.
# show the top fifteen candidates who fundraised the most for each party at each level of office
# Senate Democrats
final_top_senate_dems <- final_senate %>%
filter(Party == "DEM") %>%
top_n(15, total_fundraised_sum)
ggplot(final_top_senate_dems, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
geom_bar(stat="identity") +
geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
scale_fill_gradient(low="lightblue",high="blue") +
labs(title = "Fig. 9 Top Fundraisers Among Senate Democratic Candidates",
x = "Contribution Amount ($)",
y = "Candidate Name") +
guides(fill=guide_legend(title="Contribution Amount")) +
scale_x_continuous(labels=scales::comma, limits = c(0, 42000000)) +
theme_classic()
Fig. 9 shows the 15 highest Democratic fundraisers running for Senate in the 2022 midterm election. This graph shows almost exclusively Senate seats that were at least slightly competitive. Five of the 15 candidates lost their election - most notably the 3rd highest fundraiser, Val Demings, who lost by 16 points to Marco Rubio in Florida. Meanwhile Mandela Barnes, the Democratic challenger in Wisconsin, lost by just 1 point, despite his opponent Ron Johnson being the highest Republican fundraiser this cycle (Fig. 10). Each state's Senate election has different factors that influence not only how much money comes in, but how much it affects the final vote count.
# show the top fifteen candidates who fundraised the most for each party at each level of office
# Senate Republicans
final_top_senate_gop <- final_senate %>%
filter(Party == "GOP") %>%
top_n(15, total_fundraised_sum)
ggplot(final_top_senate_gop, aes(x = total_fundraised_sum, y = reorder(Candidate, total_fundraised_sum), fill = total_fundraised_sum)) +
geom_bar(stat="identity") +
geom_text(aes(label=round(total_fundraised_sum / 1000000, 1)), hjust=-0.25) +
scale_fill_gradient(low="lightcoral",high="red") +
labs(title = "Fig. 10 Top Fundraisers Among Senate Republican Candidates",
x = "Contribution Amount ($)",
y = "Candidate Name") +
guides(fill=guide_legend(title="Contribution Amount")) +
scale_x_continuous(labels=scales::comma, limits = c(0, 18000000)) +
theme_classic()
Fig. 10 shows the 15 highest Republican fundraisers running for Senate in the 2022 midterm election. The top Republican fundraiser, Ron Johnson, raised less than half as much as each of the top two Democratic fundraisers, Kelly and Warnock, as Republicans rely more on super PACs than on individual contributions. Johnson raised about $1.1m less than his opponent Mandela Barnes. It’s also interesting how large the fundraising discrepancy is between the candidates in extremely competitive states. Walker, Oz, Vance, and Masters were all in battleground states from 2020, so we would expect their fundraising to be stronger. Vance, who raised the least of those four candidates, was the only one to win his Senate bid, despite Tim Ryan outraising him 3 to 1. It’s also interesting that two of the highest fundraisers, Murkowski and Tshibaka, were actually opponents in the Alaska Senate election - Murkowski, the incumbent who raised about 20-25% more, won.
# look at where contributions originated and went to by state and by party
final_receipt_date_contributor_state <- final_receipt_date %>%
group_by(Party, contributor_state) %>%
summarise(`Origin Contribution Amount` = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party'. You can override using the
## `.groups` argument.
final_receipt_date_candidate_state <- final_receipt_date %>%
group_by(Party, State, State_processed) %>%
summarise(`Destination Contribution Amount` = sum(as.double(contribution_receipt_amount)))
## `summarise()` has grouped output by 'Party', 'State'. You can override using
## the `.groups` argument.
# For Democratic candidates
joined_state_dem <- left_join(final_receipt_date_contributor_state, final_receipt_date_candidate_state, by = c("contributor_state" = "State_processed", "Party" = "Party")) %>%
filter(!is.na(State), Party == "DEM") %>%
top_n(10, `Destination Contribution Amount`)
joined_state_dem_m <- melt(joined_state_dem[,c('State',"Destination Contribution Amount","Origin Contribution Amount")],id.vars = 1)
ggplot(joined_state_dem_m, aes(x = value, y = reorder(State, value))) +
geom_bar(aes(fill = variable), stat="identity", position = "dodge") +
labs(title = "Fig. 11 Top 10 States by Fundraising Amount for Democratic Candidates",
x = "Contribution Amount ($)",
y = "State") +
scale_x_continuous(labels=scales::comma) +
guides(fill=guide_legend(title="")) +
theme_classic()
Fig. 11 shows the top 10 states that donated to Democratic candidates this cycle, with the amount contributed by donors within the state in cyan and the amount raised by candidates running in the state in orange. California was by far the state whose residents contributed the most, due in large part to the state's high population as well as Democrats being viewed more favorably there. California also had relatively high fundraising numbers for candidates running in the state, which can most likely be attributed to the high number of House races there. States like Arizona, Georgia, Pennsylvania, Ohio, Nevada, and Wisconsin had relatively high fundraising amounts compared to the amount of donations that originated there, which makes sense given that these are battleground states with highly contested races.
# For Republican candidates
joined_state_gop <- left_join(final_receipt_date_contributor_state, final_receipt_date_candidate_state, by = c("contributor_state" = "State_processed", "Party" = "Party")) %>%
filter(!is.na(State), Party == "GOP") %>%
top_n(10, `Destination Contribution Amount`)
joined_state_gop_m <- melt(joined_state_gop[,c('State',"Destination Contribution Amount","Origin Contribution Amount")],id.vars = 1)
ggplot(joined_state_gop_m, aes(x = value, y = reorder(State, value))) +
geom_bar(aes(fill = variable), stat="identity", position = "dodge") +
labs(title = "Fig. 12 Top 10 States by Fundraising Amount for Republican Candidates",
x = "Contribution Amount ($)",
y = "State") +
scale_x_continuous(labels=scales::comma) +
guides(fill=guide_legend(title="")) +
theme_classic()
Fig. 12 shows the 10 states where Republican candidates raised the most this cycle, with the amount contributed by donors within the state in cyan and the amount raised by candidates running in the state in orange. Republicans had high origin contribution amounts from populous states like Florida, California, and Texas. As with Democrats, candidates running in Arizona, Georgia, Pennsylvania, Wisconsin, and Ohio raised more than contributors living in those states gave.
# build linear model for each office level for fundraising
# house of representatives
house_model <- glm(total_fundraised_sum ~ . - Candidate - State - State_processed - District - Party, data = final_house, family="gaussian")
summary(house_model)
##
## Call:
## glm(formula = total_fundraised_sum ~ . - Candidate - State -
## State_processed - District - Party, family = "gaussian",
## data = final_house)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1278536 -339166 -179777 107333 8296658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80798 216706 0.373 0.7094
## GOP.Num 52628 70222 0.749 0.4538
## Gender.Num -25375 65429 -0.388 0.6983
## White.Num 96625 165687 0.583 0.5599
## Black.Num -80642 169146 -0.477 0.6337
## Asian.Num 300559 207802 1.446 0.1485
## Latino.Num 14499 174396 0.083 0.9338
## Middle_Eastern.Num -124492 383699 -0.324 0.7457
## Native_American.Num 541116 282529 1.915 0.0558 .
## Incumbent.Num -85585 95162 -0.899 0.3687
## Trump.Num 20790 103166 0.202 0.8403
## Party.Committee.Num 866306 122985 7.044 4.09e-12 ***
## Emily.s.List.Num 828938 132180 6.271 5.91e-10 ***
## Maggie.s.List.Num 214868 141303 1.521 0.1288
## Sanders.Num -95652 214979 -0.445 0.6565
## Renew.America.Num -292526 716622 -0.408 0.6832
## Winner.Num 458377 103335 4.436 1.05e-05 ***
## `2020.White.%` -11895 173154 -0.069 0.9452
## Pres_2020 24487 586056 0.042 0.9667
## Comp 237344 504165 0.471 0.6379
## `Dem-Rep.2020` 786932 521116 1.510 0.1314
## `Dem-Rep.Comp` -844300 533138 -1.584 0.1137
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 505639653427)
##
## Null deviance: 5.2962e+14 on 805 degrees of freedom
## Residual deviance: 3.9642e+14 on 784 degrees of freedom
## AIC: 24032
##
## Number of Fisher Scoring iterations: 2
In the House model, Party Committee endorsements, Emily’s List endorsements, and winning the general election are each associated with significantly higher fundraising (p < 0.001 for all three). Native American status is only marginally significant (p = 0.056): it meets the 10% level but not the 5% level.
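Reading significant terms off a long `summary()` printout is error-prone, so it can help to filter the coefficient table directly. The following is a minimal sketch, not the code used in this analysis: the built-in `mtcars` data stands in for `final_house`, and the 0.05 threshold and the column names assigned to the table are assumptions made for illustration.

```r
# Illustrative only: mtcars stands in for final_house, which is not reproduced here
fit <- glm(mpg ~ wt + hp, data = mtcars, family = "gaussian")

# coef(summary(fit)) is a matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- as.data.frame(coef(summary(fit)))
names(coef_table) <- c("estimate", "std_error", "t_value", "p_value")
coef_table$term <- rownames(coef_table)

# Keep non-intercept terms below the chosen threshold, sorted by p-value
alpha <- 0.05
significant <- subset(coef_table, term != "(Intercept)" & p_value < alpha)
significant <- significant[order(significant$p_value), ]
print(significant[, c("term", "estimate", "p_value")])
```

Swapping `fit` for `house_model`, `senate_model`, or `competitive_model` should give the same kind of table for each fundraising model.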
# plot basic visuals of the model
plot(house_model)
## Warning: not plotting observations with leverage one:
## 174
# build linear model for each office level for fundraising
# senate model
senate_model <- glm(total_fundraised_sum ~ . - Candidate - State_processed - State - Party, data = final_senate, family="gaussian")
summary(senate_model)
##
## Call:
## glm(formula = total_fundraised_sum ~ . - Candidate - State_processed -
## State - Party, family = "gaussian", data = final_senate)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -9866544 -4513949 -839941 2099391 24805139
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13604342 12534971 1.085 0.2834
## GOP.Num 1298602 4011008 0.324 0.7476
## Gender.Num 4267648 3285674 1.299 0.2005
## White.Num 1019774 6776886 0.150 0.8810
## Black.Num 6570202 7724269 0.851 0.3994
## Asian.Num -11733104 9244647 -1.269 0.2108
## Latino.Num 1635883 7041803 0.232 0.8173
## Middle_Eastern.Num 11049531 11244542 0.983 0.3309
## Native_American.Num -1160448 6688013 -0.174 0.8630
## Incumbent.Num 2355866 5673584 0.415 0.6799
## Trump.Num 3450534 3674620 0.939 0.3526
## Party.Committee.Num -5023172 6516587 -0.771 0.4447
## Emily.s.List.Num 11687898 5255787 2.224 0.0311 *
## Maggie.s.List.Num 4747322 7491348 0.634 0.5294
## Sanders.Num 13433738 9135195 1.471 0.1482
## Renew.America.Num 3601442 9469739 0.380 0.7055
## Winner.Num 7078693 4876333 1.452 0.1534
## `2020.White.%` -9565337 10368512 -0.923 0.3611
## Pres_2020 -17303592 47761403 -0.362 0.7188
## Comp -9512435 47423748 -0.201 0.8419
## `Dem-Rep.2020` 15356441 26154783 0.587 0.5600
## `Dem-Rep.Comp` -22094526 28462190 -0.776 0.4416
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 6.572411e+13)
##
## Null deviance: 4.8543e+15 on 67 degrees of freedom
## Residual deviance: 3.0233e+15 on 46 degrees of freedom
## AIC: 2375.9
##
## Number of Fisher Scoring iterations: 2
Among Senate candidates, an Emily’s List endorsement is the only predictor significantly associated with fundraising at the 5% level (p = 0.031).
# plot basic visuals of the model
plot(senate_model)
## Warning: not plotting observations with leverage one:
## 37, 40, 46, 59
# build linear model for each office level for fundraising
# competitive districts
competitive_model <- glm(total_fundraised_sum ~ . - Candidate - State - State_processed - Party, data = final_competitive, family="gaussian")
summary(competitive_model)
##
## Call:
## glm(formula = total_fundraised_sum ~ . - Candidate - State -
## State_processed - Party, family = "gaussian", data = final_competitive)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -12268489 -848881 76462 1035696 19165700
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3322511 6120918 0.543 0.5880
## GOP.Num 1101457 772087 1.427 0.1557
## Gender.Num -218150 810514 -0.269 0.7882
## White.Num 168242 1403072 0.120 0.9047
## Black.Num 1986476 1576572 1.260 0.2096
## Asian.Num 429797 2262050 0.190 0.8496
## Latino.Num -361048 1424548 -0.253 0.8003
## Middle_Eastern.Num -1150072 3879218 -0.296 0.7673
## Native_American.Num -832333 2619948 -0.318 0.7512
## Incumbent.Num 1395526 756064 1.846 0.0668 .
## Senator.Num 14382065 898347 16.009 <2e-16 ***
## Trump.Num -1116557 917100 -1.217 0.2253
## Party.Committee.Num 644896 744400 0.866 0.3877
## Emily.s.List.Num -44995 1136001 -0.040 0.9685
## Maggie.s.List.Num 147109 1277721 0.115 0.9085
## Sanders.Num 1679869 3620851 0.464 0.6433
## Renew.America.Num NA NA NA NA
## Winner.Num 1017024 770635 1.320 0.1889
## `2020.White.%` -2110329 1945772 -1.085 0.2798
## Pres_2020 3207249 15138129 0.212 0.8325
## Comp -8106584 10779905 -0.752 0.4532
## `Dem-Rep.2020` -1290337 7105548 -0.182 0.8561
## `Dem-Rep.Comp` 2403656 6137680 0.392 0.6959
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.147759e+13)
##
## Null deviance: 5.7186e+15 on 175 degrees of freedom
## Residual deviance: 1.7675e+15 on 154 degrees of freedom
## AIC: 5814.5
##
## Number of Fisher Scoring iterations: 2
Among candidates in competitive districts, only Senator status is statistically significant (p < 0.001). Incumbency has a p-value below 0.1 (p = 0.067) but does not reach significance at the 5% level.
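The three fundraising models can also be compared by how much variance each explains. For a Gaussian GLM the deviance is the residual sum of squares, so one minus the ratio of residual to null deviance equals the ordinary R². The sketch below only redoes that arithmetic with the deviances copied from the three `summary()` outputs above; it does not refit the models.

```r
# Null and residual deviances copied from the three summary() outputs above
deviances <- data.frame(
  model    = c("house", "senate", "competitive"),
  null     = c(5.2962e14, 4.8543e15, 5.7186e15),
  residual = c(3.9642e14, 3.0233e15, 1.7675e15)
)

# For a Gaussian GLM, deviance is the residual sum of squares, so this equals R^2
deviances$r_squared <- 1 - deviances$residual / deviances$null
print(deviances[, c("model", "r_squared")])
```

This gives roughly 0.25 for the House model, 0.38 for the Senate model, and 0.69 for the competitive model. These are unadjusted values, and with 21 predictors and only 68 Senate candidates the Senate figure in particular is likely optimistic.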
# plot basic visuals of the model
plot(competitive_model)
## Warning: not plotting observations with leverage one:
## 111, 125
# build logistic model for probability of winning
# house of representatives
house_model2 <- glm(Winner.Num ~ . - Candidate - State - State_processed - District - Party, data = final_house, family="binomial")
summary(house_model2)
##
## Call:
## glm(formula = Winner.Num ~ . - Candidate - State - State_processed -
## District - Party, family = "binomial", data = final_house)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.06874 -0.10642 0.00002 0.13882 2.33378
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.608e+01 2.512e+00 -6.402 1.54e-10 ***
## GOP.Num -2.235e+00 5.522e-01 -4.048 5.16e-05 ***
## Gender.Num 4.720e-01 5.037e-01 0.937 0.34865
## White.Num -1.045e+00 1.269e+00 -0.824 0.41016
## Black.Num -9.752e-01 1.365e+00 -0.715 0.47484
## Asian.Num -1.647e+00 1.517e+00 -1.085 0.27782
## Latino.Num 4.986e-01 1.288e+00 0.387 0.69878
## Middle_Eastern.Num 4.742e+00 1.175e+02 0.040 0.96780
## Native_American.Num -6.738e-01 1.745e+00 -0.386 0.69943
## Incumbent.Num 3.908e+00 5.004e-01 7.810 5.74e-15 ***
## Trump.Num -1.232e+00 7.787e-01 -1.582 0.11361
## Party.Committee.Num 4.212e-01 5.196e-01 0.811 0.41762
## Emily.s.List.Num 5.215e-01 7.338e-01 0.711 0.47727
## Maggie.s.List.Num -4.866e-01 9.277e-01 -0.525 0.59992
## Sanders.Num 1.559e+01 8.283e+02 0.019 0.98498
## Renew.America.Num 1.115e+01 3.956e+03 0.003 0.99775
## `2020.White.%` 1.951e+00 1.272e+00 1.534 0.12499
## Pres_2020 3.620e+01 6.687e+00 5.413 6.19e-08 ***
## Comp -6.388e+00 4.735e+00 -1.349 0.17734
## `Dem-Rep.2020` -6.615e+00 4.545e+00 -1.455 0.14556
## `Dem-Rep.Comp` 7.241e-01 4.130e+00 0.175 0.86083
## total_fundraised_sum 8.901e-07 2.873e-07 3.098 0.00195 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1117.33 on 805 degrees of freedom
## Residual deviance: 213.01 on 784 degrees of freedom
## AIC: 257.01
##
## Number of Fisher Scoring iterations: 16
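This model shows which characteristics are associated with a House candidate’s chances of winning. GOP status, incumbency, total funds raised, and the Pres_2020 variable are statistically significant at the 5% level. The coefficient for GOP status is negative, while those for incumbency, Pres_2020, and total funds raised are positive.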
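The fundraising coefficient looks tiny because it is expressed per dollar on the log-odds scale. Rescaling it to an odds ratio per $100,000 makes it easier to read. The sketch below uses only the estimate and standard error printed in `summary(house_model2)`; the $100,000 step size is an arbitrary choice for illustration, and the interval is a simple Wald approximation.

```r
# Coefficient and standard error for total_fundraised_sum, copied from summary(house_model2)
beta <- 8.901e-07
se   <- 2.873e-07

# Rescale from "per dollar" to "per $100,000" (the step size is an arbitrary choice)
step <- 1e5
odds_ratio <- exp(beta * step)

# Wald 95% interval on the log-odds scale, then exponentiated
ci <- exp((beta + c(-1, 1) * qnorm(0.975) * se) * step)

print(round(c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2]), 3))
```

The odds ratio comes out at about 1.09, meaning that each additional $100,000 raised is associated with roughly 9% higher odds of winning, holding the other variables fixed.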
# plot basic visuals of the models
plot(house_model2)
## Warning: not plotting observations with leverage one:
## 174
# build logistic model for probability of winning
# senate
senate_model2 <- glm(Winner.Num ~ . - Candidate - State - State_processed - Party, data = final_senate, family="binomial")
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(senate_model2)
##
## Call:
## glm(formula = Winner.Num ~ . - Candidate - State - State_processed -
## Party, family = "binomial", data = final_senate)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.069e-05 -2.110e-08 -2.110e-08 2.110e-08 2.854e-05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.581e+02 4.746e+07 0.000 1
## GOP.Num -2.207e+01 4.642e+05 0.000 1
## Gender.Num -1.592e+01 3.712e+05 0.000 1
## White.Num 3.628e+01 4.746e+07 0.000 1
## Black.Num 4.851e+01 4.746e+07 0.000 1
## Asian.Num -4.403e+01 5.376e+05 0.000 1
## Latino.Num 2.596e+01 4.746e+07 0.000 1
## Middle_Eastern.Num -8.345e+01 4.746e+07 0.000 1
## Native_American.Num -8.042e+01 8.184e+05 0.000 1
## Incumbent.Num 1.095e+02 4.622e+05 0.000 1
## Trump.Num 5.701e+01 5.792e+05 0.000 1
## Party.Committee.Num -7.959e+01 8.014e+05 0.000 1
## Emily.s.List.Num -3.516e+01 5.754e+05 0.000 1
## Maggie.s.List.Num -6.658e+01 4.746e+07 0.000 1
## Sanders.Num -4.922e+01 4.541e+05 0.000 1
## Renew.America.Num 3.361e+01 5.781e+05 0.000 1
## `2020.White.%` 2.972e+02 8.125e+05 0.000 1
## Pres_2020 4.701e+02 3.944e+06 0.000 1
## Comp -3.306e+01 2.317e+06 0.000 1
## `Dem-Rep.2020` -6.909e+02 1.990e+06 0.000 1
## `Dem-Rep.Comp` 8.288e+02 1.626e+06 0.001 1
## total_fundraised_sum 7.587e-07 1.150e-02 0.000 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9.4033e+01 on 67 degrees of freedom
## Residual deviance: 2.9417e-09 on 46 degrees of freedom
## AIC: 44
##
## Number of Fisher Scoring iterations: 25
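The Senate model did not converge. R warns that fitted probabilities numerically 0 or 1 occurred, the residual deviance is effectively zero, and every standard error is enormous. These are symptoms of complete separation: the 21 predictors perfectly classify the 68 candidates, so the p-values of 1 should not be read as evidence that no variable matters. The most likely cause is the small number of observations relative to the number of predictors.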
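The warnings and the near-zero residual deviance in the Senate output are what complete separation typically looks like in `glm()`. The sketch below reproduces the pattern on a six-row toy data set (not the Senate data) in which the outcome is perfectly determined by one predictor; the `1e-6` tolerances are arbitrary assumptions.

```r
# Toy data, not the Senate data: y is perfectly predicted by whether x exceeds 3.5
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(0, 0, 0, 1, 1, 1)
)

# Capture warnings instead of printing them so they can be inspected
warnings_seen <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, data = toy, family = "binomial"),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warnings_seen)

# Symptoms of separation: residual deviance near zero and fitted values at 0 or 1
separated <- deviance(fit) < 1e-6 &&
  all(fitted(fit) < 1e-6 | fitted(fit) > 1 - 1e-6)
print(separated)
```

When a model shows these symptoms, the usual options are to fit fewer predictors or to use a penalized method; either would be a change to the analysis rather than something done here.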
# plot basic visuals of the models
plot(senate_model2)
## Warning: not plotting observations with leverage one:
## 37, 40, 46, 59
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
# build logistic model for probability of winning
# competitive
competitive_model2 <- glm(Winner.Num ~ . - Candidate - State - State_processed - Party, data = final_competitive, family="binomial")
summary(competitive_model2)
##
## Call:
## glm(formula = Winner.Num ~ . - Candidate - State - State_processed -
## Party, family = "binomial", data = final_competitive)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.25859 -0.38923 0.00766 0.34084 2.13679
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.026e+01 7.622e+00 -5.283 1.27e-07 ***
## GOP.Num -2.606e+00 8.254e-01 -3.157 0.00159 **
## Gender.Num 8.466e-01 8.334e-01 1.016 0.30968
## White.Num 1.777e+00 1.434e+00 1.239 0.21520
## Black.Num 2.530e+00 1.653e+00 1.530 0.12590
## Asian.Num 2.546e+00 2.256e+00 1.128 0.25912
## Latino.Num 4.716e+00 1.588e+00 2.970 0.00298 **
## Middle_Eastern.Num -1.463e+01 2.400e+03 -0.006 0.99513
## Native_American.Num -5.690e-01 1.776e+00 -0.320 0.74872
## Incumbent.Num 3.769e+00 8.607e-01 4.379 1.19e-05 ***
## Senator.Num -5.456e-02 1.584e+00 -0.034 0.97253
## Trump.Num -1.170e+00 9.008e-01 -1.299 0.19408
## Party.Committee.Num -2.227e-01 6.703e-01 -0.332 0.73965
## Emily.s.List.Num 5.765e-01 1.102e+00 0.523 0.60093
## Maggie.s.List.Num 1.120e+00 1.276e+00 0.878 0.38015
## Sanders.Num -1.687e+01 2.400e+03 -0.007 0.99439
## Renew.America.Num NA NA NA NA
## `2020.White.%` 4.502e+00 2.047e+00 2.199 0.02787 *
## Pres_2020 6.032e+01 1.543e+01 3.909 9.26e-05 ***
## Comp 1.052e+01 1.174e+01 0.896 0.37026
## `Dem-Rep.2020` -5.359e+00 7.376e+00 -0.727 0.46750
## `Dem-Rep.Comp` 7.180e+00 6.257e+00 1.147 0.25120
## total_fundraised_sum 6.626e-08 9.857e-08 0.672 0.50143
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 243.99 on 175 degrees of freedom
## Residual deviance: 105.39 on 154 degrees of freedom
## AIC: 149.39
##
## Number of Fisher Scoring iterations: 15
Among candidates in competitive districts, incumbency, 2020 presidential vote share, GOP status, Latino status, and the district's white population share are statistically significant at the 5% level in predicting a win.
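Both competitive-district models report one coefficient as "not defined because of singularities", and `Renew.America.Num` is the term shown as `NA`. A plausible cause, stated here as an assumption rather than something verified against `final_competitive`, is that the column takes a single value in this subset. The sketch below shows how a constant column produces an `NA` coefficient, using made-up toy data with hypothetical column names.

```r
# Toy data, not final_competitive: `flag_b` is all zeros, so it carries no information
set.seed(1)
toy <- data.frame(
  y      = rnorm(20),
  x      = rnorm(20),
  flag_a = rep(c(0, 1), times = 10),
  flag_b = rep(0, 20)
)

# Columns with a single distinct value cannot be estimated alongside the intercept
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
constant_cols <- names(toy)[is_constant]
print(constant_cols)

# lm()/glm() report such terms as NA ("not defined because of singularities")
fit <- lm(y ~ ., data = toy)
aliased_terms <- names(which(is.na(coef(fit))))
print(aliased_terms)
```

Running the same constant-column check on `final_competitive` before modelling would confirm or rule out this explanation.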
# plot basic visuals of the model
plot(competitive_model2)
## Warning: not plotting observations with leverage one:
## 111, 125
# conduct principal component analysis for each data set
# for house
final_house_pca <-
as.data.frame(final_house) %>%
select(-Candidate, -State, -State_processed, -Party, -District) %T>%
pairs(.)
pca_house <- prcomp(x = final_house_pca,
scale. = TRUE)
summary(pca_house)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.9014 1.8470 1.3618 1.24252 1.17086 1.1010 1.04295
## Proportion of Variance 0.1643 0.1551 0.0843 0.07018 0.06231 0.0551 0.04944
## Cumulative Proportion 0.1643 0.3194 0.4037 0.47387 0.53618 0.5913 0.64072
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.02695 1.00068 0.99523 0.97272 0.91376 0.87600 0.77513
## Proportion of Variance 0.04794 0.04552 0.04502 0.04301 0.03795 0.03488 0.02731
## Cumulative Proportion 0.68866 0.73418 0.77920 0.82221 0.86016 0.89504 0.92235
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.6667 0.60989 0.5801 0.53518 0.39160 0.23269 0.22284
## Proportion of Variance 0.0202 0.01691 0.0153 0.01302 0.00697 0.00246 0.00226
## Cumulative Proportion 0.9425 0.95946 0.9748 0.98778 0.99475 0.99721 0.99947
## PC22
## Standard deviation 0.10845
## Proportion of Variance 0.00053
## Cumulative Proportion 1.00000
fviz_screeplot(pca_house)
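The scree plot shows the first 10 dimensions. The first two explain 16.4% and 15.5% of the variance, dimensions 3 through 7 each explain roughly 5% to 8%, and dimensions 8 through 10 slightly less (4.5% to 4.8%). Together, the first 10 dimensions account for 77.9% of the variance.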
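The reading of the scree plot can be made explicit by computing cumulative variance from the `prcomp` object. This is a minimal sketch that uses the built-in `USArrests` data as a stand-in, because `final_house_pca` is not reproduced here; the 80% threshold is an arbitrary assumption.

```r
# USArrests (built into R) stands in for final_house_pca, which is not reproduced here
pca <- prcomp(USArrests, scale. = TRUE)

# Variance of each component is its squared standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches a chosen threshold
threshold <- 0.80  # arbitrary cutoff for illustration
n_components <- which(cum_explained >= threshold)[1]

print(round(cum_explained, 3))
print(n_components)
```

Applied to `pca_house`, the same lines would reproduce the "Cumulative Proportion" row of `summary(pca_house)`. The variable loadings for each component are stored in the `rotation` element of the `prcomp` object.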
# for senate
final_senate_pca <-
as.data.frame(final_senate) %>%
select(-Candidate, -State, -State_processed, -Party) %T>%
pairs(.)
pca_senate <- prcomp(x = final_senate_pca,
scale. = TRUE)
summary(pca_senate)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.0515 1.7211 1.47590 1.40218 1.24354 1.13690 1.0634
## Proportion of Variance 0.1913 0.1346 0.09901 0.08937 0.07029 0.05875 0.0514
## Cumulative Proportion 0.1913 0.3259 0.42495 0.51432 0.58461 0.64336 0.6948
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.05395 0.99130 0.95454 0.92009 0.85839 0.75357 0.65405
## Proportion of Variance 0.05049 0.04467 0.04142 0.03848 0.03349 0.02581 0.01944
## Cumulative Proportion 0.74526 0.78993 0.83134 0.86982 0.90332 0.92913 0.94857
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.61424 0.50633 0.45342 0.35165 0.27780 0.23145 0.1482
## Proportion of Variance 0.01715 0.01165 0.00935 0.00562 0.00351 0.00243 0.0010
## Cumulative Proportion 0.96572 0.97738 0.98672 0.99234 0.99585 0.99828 0.9993
## PC22
## Standard deviation 0.12572
## Proportion of Variance 0.00072
## Cumulative Proportion 1.00000
fviz_screeplot(pca_senate)
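The scree plot shows the first 10 dimensions. The first two explain 19.1% and 13.5% of the variance, dimensions 3 through 8 each explain roughly 5% to 10%, and dimensions 9 and 10 slightly less (4.5% and 4.1%). Together, the first 10 dimensions account for 83.1% of the variance.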
# for competitive districts
final_competitive_pca <-
as.data.frame(final_competitive) %>%
select(-Candidate, -State, -State_processed, -Party, -Renew.America.Num) %T>%
pairs(.)
pca_competitive <- prcomp(x = final_competitive_pca,
scale. = TRUE)
summary(pca_competitive)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.7436 1.5769 1.5038 1.37800 1.28231 1.2113 1.10870
## Proportion of Variance 0.1382 0.1130 0.1028 0.08631 0.07474 0.0667 0.05587
## Cumulative Proportion 0.1382 0.2512 0.3540 0.44032 0.51506 0.5818 0.63763
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.07136 1.00831 1.00065 0.92541 0.89189 0.80740 0.75311
## Proportion of Variance 0.05217 0.04621 0.04551 0.03893 0.03616 0.02963 0.02578
## Cumulative Proportion 0.68980 0.73601 0.78153 0.82045 0.85661 0.88624 0.91203
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.69393 0.57392 0.55322 0.47556 0.44225 0.40491 0.39307
## Proportion of Variance 0.02189 0.01497 0.01391 0.01028 0.00889 0.00745 0.00702
## Cumulative Proportion 0.93391 0.94889 0.96280 0.97308 0.98197 0.98942 0.99644
## PC22
## Standard deviation 0.27977
## Proportion of Variance 0.00356
## Cumulative Proportion 1.00000
fviz_screeplot(pca_competitive)
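The scree plot shows the first 10 dimensions. The first three explain 13.8%, 11.3%, and 10.3% of the variance, dimensions 4 through 8 each explain roughly 5% to 9%, and dimensions 9 and 10 slightly less (about 4.6% each). Together, the first 10 dimensions account for 78.2% of the variance.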
Between the two parties, Democrats had the stronger fundraising performance from individual contributions in the 2022 midterm election. We saw a distinct gap open between the parties' fundraising trends after abortion became a more salient political issue. Before the Dobbs leak on May 2nd, 2022, both parties were fundraising at about the same rate. For Democrats, while the Dobbs decision was a blow to their broader agenda, it was a watershed moment for their fundraising. Presidential approval was low, global markets had been turbulent, inflation was at a multi-decade high, and enthusiasm for fundraising was low. However, Dobbs gave Democrats a unifying rallying cry in their messaging that changed the trajectory of their fundraising and, in turn, their chances of winning.
A plurality of Democratic candidates ran on a message of choice, and its appeal was seen in the Kansas abortion referendum and in Democratic overperformance in key special elections in Alaska, Wisconsin, and New York. In our results, an endorsement from Emily’s List, a pro-choice women's organization, was associated with significantly higher fundraising for House and Senate candidates. High fundraising is indicative of a greater chance of winning, so every year candidates strive to raise more than ever. Based on these trends, Democrats’ overperformance in fundraising in the months leading up to the election also indicated the potential to outperform election expectations, despite the fundamentals being against them (presidential approval, the economy, and the midterm advantage of the party out of power).
One advantage of our work is that it identifies key candidate characteristics from the primaries, such as endorsements, fundraising numbers, district characteristics, and candidate demographics. It also allows us to look at both fundraising expectations and expectations of winning the election. One limitation is that the FEC data did not include the final weeks of the election cycle. In addition, the FEC only has reporting for individuals contributing over $200, and not all PAC money is included. The FEC API was difficult to incorporate into our analysis because of the complexity of processing the information we needed, and the FEC website interface itself was difficult to work with. If the API had worked, it would have streamlined our data collection process and made it easier to analyze updated FEC data.
Because the Dobbs decision came near the end of a fundraising quarter, another limitation is that it is hard to clearly distinguish the full “Dobbs effect” from general end-of-quarter fundraising trends. With more time, we would have removed additional outliers and optimized our models, checking their variance and using the PCA results to fine-tune them. Further research could include checking the vote margins for the 2022 election against past cycles, updating our analysis once FEC reports are finalized (data past Oct 19th), and studying the fundraising language and methods used by each candidate to see how prevalent keywords related to major political shocks were and which methods were more effective. The 2022 midterm election cycle was unique in political history, and more research on it may provide meaningful insight into future elections.
Open Secrets (2003), Election Overview: Cost of the Election. Washington, DC: Center for Responsive Politics. https://www.opensecrets.org/elections-overview/cost-of-election
Open Secrets (2003), Election Overview: Did Money Win?. Washington, DC: Center for Responsive Politics. https://www.opensecrets.org/elections-overview/winning-vs-spending
New York Times (2022), House Election Tracker https://www.nytimes.com/news-event/2022-midterm-elections
New York Times (2022), Senate Election Tracker https://www.nytimes.com/news-event/2022-midterm-elections
Politico (2022), Read Justice Alito’s initial draft abortion opinion which would overturn Roe v. Wade https://www.politico.com/news/2022/05/02/read-justice-alito-initial-abortion-opinion-overturn-roe-v-wade-pdf-00029504
Supreme Court Dobbs Decision https://www.supremecourt.gov/opinions/21pdf/19-1392_6j37.pdf
United States (1997). U.S. Federal Election Commission FEC https://www.fec.gov/data
Dave’s Redistricting App https://davesredistricting.org/maps#home
Scherer, M. (2022). Democrats sound alarms about funding in battle for House majority. The Washington Post. https://www.washingtonpost.com/politics/2022/10/07/house-democrats-fundraising/
Fry, H. (2022). Where do rep. Katie Porter and Scott Baugh stand on abortion, inflation, immigration? Los Angeles Times. https://www.latimes.com/politics/story/2022-10-20/2022-california-midterm-election-porter-baugh-abortion-economy-environment
FiveThirtyEight (2022) Primary Project 2022 https://github.com/fivethirtyeight/data/tree/master/primary-project-2022